Statistical Science 

2010, Vol. 25, No. 1, 107-125 

DOI: 10.1214/10-STS326 

© Institute of Mathematical Statistics, 2010 



The Importance of Scale for 
Spatial-Confounding Bias and Precision 
of Spatial Regression Estimators 



Christopher J. Paciorek 



O 

> 
O 



> 
OS 



x 

S3 



Abstract. Residuals in regression models are often spatially correlated. 
Prominent examples include studies in environmental epidemiology to 
understand the chronic health effects of pollutants. I consider the ef- 
fects of residual spatial structure on the bias and precision of regression 
coefficients, developing a simple framework in which to understand the 
key issues and derive informative analytic results. When unmeasured 
confounding introduces spatial structure into the residuals, regression 
models with spatial random effects and closely-related models such as 
kriging and penalized splines are biased, even when the residual vari- 
ance components are known. Analytic and simulation results show how 
the bias depends on the spatial scales of the covariate and the residual: 
one can reduce bias by fitting a spatial model only when there is varia- 
tion in the covariate at a scale smaller than the scale of the unmeasured 
confounding. I also discuss how the scales of the residual and the co- 
variate affect efficiency and uncertainty estimation when the residuals 
are independent of the covariate. In an application on the association 
between black carbon particulate matter air pollution and birth weight, 
controlling for large-scale spatial variation appears to reduce bias from 
unmeasured confounders, while increasing uncertainty in the estimated 
pollution effect. 

Key words and phrases: Epidemiology, identifiability, mixed model, 
penalized likelihood, random effects, spatial correlation, splines. 



1. INTRODUCTION 

Spatial confounding is likely present in many of 
the applied contexts in which residuals are spatially 
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correlated, particularly in public health and social 
science. Consider the motivating example of the 
health effects of exposure to (spatially varying) air 
pollution, an important public health issue. Many 
variables that explain variability in the response, in- 
cluding potential confounding variables that may be 
correlated with exposure, also vary spatially. For ex- 
ample, large-scale regional patterns in air pollution 
may be correlated with regional patterns in diet, in- 
come and other risk factors for a health outcome of 
interest. Small-scale patterns in air pollution from 
local sources may be correlated with risk factors as 
well, for example, if lower-income people live nearer 
to busy roads or industrial sources. If confounding 
variables are not measured, it will be difficult to 
distinguish the effect of air pollution from residual 
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spatial variation in the health outcome. I use the 
term spatial confounding to characterize this situa- 
tion. Researchers have modeled the spatial structure 
in the outcome with the apparent goal of reducing 
confounding bias (e.g., Clayton, Bernardinelli and 
Montomoli, 1993; Pope et al., 2002; Cakmak et al., 
2003; Biggeri et al., 2005). However, the statistical 
mechanism for reducing bias does not appear to be 
well understood nor investigated rigorously in the 
statistical or applied literature. 

To consider the problem formally, start with sim- 
ple linear regression with spatial structure: 

Yi = Po + PxXi + ei, i = l,...,n, 

(1) 

e~A/"(0,£), 

where each outcome, Yi, is associated with a spatial 
location, Sj G 3? 2 . Xi is the corresponding value of a 
univariate regressor of interest, which may also vary 
spatially, in which case we would represent Xi as 
X(si). e = (ei, . . . , e n ) T is the vector of errors, whose 
covariance matrix, X, captures any residual spatial 
correlation, as well as independent variation. The 
regression coefficients, /3 = {/3q, x }, are unknown, 
and estimation of j3 x is of primary interest. Spatial 
statistics and regression texts note that the ordi- 
nary least squares (OLS) estimator for (3 X in this 
setting is unbiased but inefficient, and the usual OLS 
variance estimator is incorrect. Assuming known X, 
the generalized least squares (GLS) estimator is the 
most efficient estimator. However, little appears to 
be known about how the spatial scales of the resid- 
ual variability and of X affect inference. Spatial struc- 
ture in X is very common in applications and com- 
plicates the problem because X and the residual 
spatial structure compete to explain the variability 
in the response (Waller and Gotway, 2004). Further- 
more, it would not be surprising if the spatial cor- 
relation in the residuals were caused by an unmea- 
sured spatially varying confounder; I next introduce 
another representation of (1) to enable exploration 
of confounding. Motivated by the air pollution ex- 
ample, I will refer to X as the "exposure." 

One can obtain the basic spatial regression model 

(1) using a simple mixed model, 

(2) Y i = p + p x X(s i ) + g{si)+e i , 

with random effects, g= (g(s\), . . . , g(s n )) T , and 

white noise errors, M(0, r 2 ). Suppose the ran- 

dom effects are spatially correlated, with g ~ Af(Q, 



dgTl(9g)), where R(0 9 ) is a spatial correlation ma- 
trix parameterized by 6 g , a spatial range parame- 
ter, and cr 2 is the variance of the random effects. 
Marginalizing over g gives the marginal likelihood, 

Y = (Y 1 ,...,Y n ) T 

(3) 

~AA(/3 l + /3xX,a 2 R(^) + r 2 I), 

where 1 is an n-vector of ones, I is the identity 
matrix, and X= (Xi, . . . , X n ) T . Here X in (1) is 
explicitly decomposed into spatial and nonspatial 
components. An alternative formulation would spec- 
ify the unknown spatial function, g(s), ELS 9l penal- 
ized spline, where a penalty parameter plays the role 
of {9g,&g} in the marginal likelihood in penalizing 
complexity of the spatial structure. The exposure 
may itself be spatially correlated. For example, if 
X(s) is a Gaussian process, then X ~ M(0, a^R.{6 x )), 
with parameters analogous to those for g. To demon- 
strate processes operating at different spatial scales, 
Figure 1 shows simulated spatial surfaces as one 
varies the spatial range parameter, 8, in a Gaussian 
process model. 

The spatial statistics literature assumes that the 
error, e, in (1), is independent of the covariate(s) 
(Cressie, 1993; Waller and Gotway, 2004), with lit- 
tle or no discussion of the possibility that the er- 
ror involves variation from unmeasured confounders. 
Henceforth, I will refer to the errors as residuals 
because of the common use of the term "spatial 
residual" to refer to unexplained spatial variabil- 
ity. To explore the possibility of confounding, let's 
consider g = (3 Z Z to be induced as the effect, /3 Z , 
of an unmeasured variable, Z, on the outcome. Z = 
(Z(s\), . . . , Z(s n )) T may also be spatially correlated, 
for example, Z ~ A/"(0, £r 2 R(# z )), such that a z = 
<r 2 //3 2 , where 6 Z is again a spatial range parame- 
ter. If Z and X are dependent, then Z is an unmea- 
sured spatial confounder. Derivation of the marginal 
likelihood should be done by integrating over the 
(unknown) conditional distribution of Z given X, 
whereas the integration leading to (1) ignores the 
dependence. Note that if X and Z are considered 
fixed, then association between X and g = fi z Z is 
known as concurvity (Buja, Hastie and Tibshirani, 
1989; Ramsay, Burnett and Krewski, 2003). 

In the applied literature, practitioners often rec- 
ognize the need to consider residual spatial struc- 
ture in the outcome, with language of "control" or 
"accounting" for autocorrelation, and they fit mod- 
els (such as kriging or spatial random effects) that 
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implicitly assume independence of the residual and 
the exposure (Burnett et al., 2001; Cakmak et al., 
2003; Cho, 2003; Burden et al., 2005; Augustin et al., 
2007; Molitor et al., 2007; Cerda et al., 2009; 
Lee, Ferguson and Mitchell, 2009). With the recent 
exception of Hodges and Reich (2010), formal state- 
ments of the goals and properties of fitting such spa- 
tial models are generally absent. However, much of 
the interest appears to lie in using the spatial resid- 
ual structure to try to account for spatial confound- 
ing, with the implicit assumption that such models 
reduce or eliminate confounding bias 
(e.g., Clayton, Bernardinelli and Montomoli, 1993; 
Pope et al., 2002; Cakmak et al., 2003; Richardson, 
2003; Biggeri et al., 2005). One approach is to ex- 
plicitly consider the spatial scales involved, hoping 
that accounting for variation at a relatively large 
spatial scale allows for identification of the parame- 
ter of interest based on exposure heterogeneity at 
a smaller spatial scale (e.g., Burnett et al., 2001; 
Cakmak et al., 2003; Zeger et al., 2007). This smaller 
scale variation may be less prone to confounding in 
a given application. However, this consideration of 
spatial scale is often not explicit, and effects of scale 
on bias reduction, while sometimes hinted at, have 
not been developed formally. 

In the analogous context of time series modeling 
of air pollution, Dominici, McDermott and Hastie 
(2004) attempt to attribute all the temporally corre- 
lated variability in the outcome to the residual term 
in order to identify the effect of exposure based on 
the temporally uncorrelated (and presumably un- 
confounded) heterogeneity in the exposure. Dominici, 
McDermott and Hastie (2004) provide no guidance 
in the scenario that the exposure cannot be decom- 
posed into autocorrelated and uncorrelated compo- 
nents. This issue also applies to the approach of 



Lombardfa and Sperlich (2007), who filter out the 
dependence between fixed and random effects. In 
the spatial setting, in which measurements cannot 
be made at all locations, accurate estimation of the 
uncorrelated component, if such a component even 
exists, is rare: consider atmospheric phenomena such 
as temperature and air pollution. A common situa- 
tion in which fine-scale heterogeneity is not resolved 
involves prediction of spatially varying exposure val- 
ues using averages of nearby measurements or spa- 
tial smoothing techniques. Hence, I seek to address 
the problem when all of the measured components 
of variation in exposure are spatial. 

In this paper I address estimation in simple regres- 
sion models with spatial residual structure. I focus 
on the properties of penalized models, using a sim- 
ple mixed model fit by GLS, equivalent to universal 
kriging, to analyze the effects of the spatially cor- 
related residual structure on fixed effect estimators. 
Section 2 focuses on bias from spatial confounding. 
I report analytic results when the full covariance 
structure is known and supporting simulations when 
the covariance (or the amount of smoothing in pe- 
nalized spline models) is estimated from the data. 
I assess the use of sensitivity analysis approaches 
based on spline models that explicitly consider the 
bias-variance tradeoff involved in choosing the spa- 
tial scale at which to model the residual variation. 
Section 3 focuses on precision of estimators when 
there is no association between exposure and resid- 
ual (no spatial confounding). I close with a case 
study of the effects of air pollution on birthweight 
(Section 4). 




Fig. 1. Gaussian process realizations using the Matern covariance (see Section 2.2) for three values of 9, with (a) high- 
-frequency, small (fine)-scale variability when 6 — 0.1, (b) moderate scale variability when 6 — 0.5, and (c) low-frequency, 
large-scale variability when 9 = 0.9. 
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2. SPATIAL CONFOUNDING AND BIAS 

2.1 Identifiability 

A key consideration in the basic model (2) is iden- 
tifiability of P x and g(s). A closely-related ques- 
tion is how the estimation procedure attributes vari- 
ability between the exposure and the spatial resid- 
ual term (the random effects). In the simple linear 
model, attribution of variability to the covariates 
rather than the error term is favored because this 
allows the estimate of the error variance to decrease, 
with the normalizing constant of the likelihood fa- 
voring smaller error variance. In the spatial model, if 
the spatial term, g, is unconstrained, then /3 X X and 
g are not identifiable in the likelihood: one could 
remove the covariate from the model and redefine 
g*(s) = f3 x X{s) + g(s) with no change in the likeli- 
hood. Identifiability comes through constraints on 
g, either by (1) penalizing lack of smoothness in 
g(s), (2) considering g to be a random effects term, 
or (3) having a prior on g. These approaches give 
higher penalized likelihood, marginal likelihood or 
posterior density, respectively, when variability is at- 
tributed to the unpenalized fixed effects term rather 
than to the spatial term. In the spatial confounding 
context this dynamic causes bias in estimation of f3 x , 
for example, as seen in the simulations of Peng, Do- 
minici and Louis (2006). An alternative constraint is 
to represent g in a reduced dimension basis, say, as 
a regression spline. In this case the model is identifi- 
able if there is a component of variability in X that 
cannot be explained by the spline structure, that is, 
if X is not perfectly collinear with the columns of 
the chosen basis matrix. 

2.2 Analytic Framework 

To consider bias from unmeasured spatially vary- 
ing confounders, take the following model as the 
data-generating mechanism, 

(4) y i ~AA(/3 + /3x^(s l )+/3 2 Z(s i ),r 2 ), 

with the notation as in Section 1. For each location, 
s, suppose the correlation of X(s) and Z(s) over 
repeated sampling at the location is p ^ 0, so that 
Z is a confounder. Suppose further that Z is not 
observed and that one models the residual spatial 
structure in the outcome through spatially corre- 
lated random effects, g ~ A/"(0, agR(8 g )) as in (2). 
Finally, suppose that one ignores the correlation be- 
tween g = /3 Z Z and X and integrates over the 
marginal distribution for g, giving (3). Equivalently, 



Yi = Po + f3 x X(si) + e*. The induced correlation be- 
tween X and e* violates the usual regression as- 
sumption that the error is independent of the co- 
variate, leading to bias. From the random effects 
perspective, we have (incorrectly) assumed that the 
random effects are independent of the covariate, a 
key (but often unstated) assumption of mixed effects 
models (Breslow and Clayton, 1993; Diggle et al., 
2002, page 170). 

The treatment of X(s) and Z(s) as random natu- 
rally induces spatial structure. However, in a given 
data set the most plausible repeated sampling frame- 
work may suggest that X and Z reflect spatial struc- 
ture that does not arise from a stochastic data gen- 
erating process. Rather, one might consider A(s) 
and Z(s) to be fixed unknown functions, particularly 
when X and Z vary at large scales, which mimics the 
partial spline/partial linear setting. This also is con- 
sistent with the treatment of large-scale variation in 
the mean term in traditional kriging. Consider the 
case when there is concurvity between the two fixed 
functions, reflected in a nonzero empirical correla- 
tion, p, between X and g = /3 Z Z as calculated over 
the collection of locations (e.g., the concurvity in 
the simulations of Ramsay, Burnett and Krewski, 
2003; He, Mazumdar and Arena, 2006; 

Peng, Dominici and Louis, 2006). In the partial lin- 
ear/partial spline setting it is well known that such 
association between the exposure and the nonpara- 
metric smooth term causes bias (Rice, 1986, Equa- 
tion 28; Speckman, 1988). In any real data set, the 
orthogonality needed for p ~ seems particularly 
unlikely if both X and Z vary at large scale relative 
to the size of the domain (though p < may be as 
much a possibility as p > 0). 

The stochastic generative model is still useful un- 
der this framework of fixed functions because real- 
izations of X(s) and Z(s) give plausible values for 
X and Z that could arise in real applications for 
which there is no reasonable stochastic mechanism. 
I choose to treat X and Z stochastically, and I use 
p to quantify explicitly the strength of association 
between the residual spatial variation and the expo- 
sure. This approach allows for some simple, useful 
analytic results and is further justified in that the 
variation that an unmeasured Z induces in Y is nec- 
essarily treated stochastically as part of the residual 
in actual applications. In some cases I report results 
conditional on X, and in others I also average over 
the stochastic variability in X and over variability 
in the spatial locations of the observations. 
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Since Z represents an unmeasured confounder, I 
assess the inferential properties of fitting a regres- 
sion model by maximizing the marginal likelihood 
(3) using GLS, thereby ignoring correlation between 
the residual and the exposure. I assess bias as a func- 
tion of the spatial scales of X(s) and Z(s), which I 
suppose to be generated as Gaussian processes with 
Matern spatial correlation function 

where d is the Euclidean distance between two lo- 
cations, 8 is the spatial range parameter, and 1C V (-) 
is the modified Bessel function of the second kind, 
whose order is the smoothness parameter, v. I fix 
v = 2, which gives continuous and differ entiable 
Gaussian process realizations. This reflects an as- 
sumption of some smoothness in the spatial pro- 
cesses under consideration, but I also consider re- 
sults based on an exponential correlation function 
(i.e., i/ = 0.5). The model (3) is equivalent to both 
a mixed model and a universal kriging model if one 
knows the variance and spatial dependence parame- 
ters. Furthermore, given the extensive use of penal- 
ized splines in applications, and the connection be- 
tween penalized splines and mixed models 
(Ruppert, Wand and Carroll, 2003), I also consider 
the use of a penalized spline to represent g(s). 

In the nonspatial context, one would generally try 
to adjust for confounding by including relevant co- 
variates as fixed effects; in the spatial context one 
could include spatial regression spline terms. The 
basic question that I explore in the remainder of 
Section 2 is the extent to which inclusion of a spa- 
tial random effect term or a penalized spline can 
adjust for unmeasured spatial confounding, given 
that these approaches do not involve a projection 
in the way that a regression spline does. The ran- 
dom effects and penalized spline approaches do esti- 
mate the residual spatial variation based on a bias- 
variance tradeoff (e.g., Claeskens, Krivobokova and 
Opsomer, 2009), and the penalized spline is a re- 
gression spline in the limit as the penalty goes to 
zero. So it seems plausible that these approaches 
may reduce bias by at least partially adjusting for 
the unmeasured spatial confounder. I will show that 
the spatial scales involved are critical. 

2.3 Bias with Known Parameters 

This section considers bias when I suppose that 
the variance parameters are known and only the re- 
gression coefficients, /3q and /3 X , are unknown. The 



initial results concern the situation when the expo- 
sure, X, and the unmeasured confounder, Z, vary at 
the same spatial scale. I then assess what happens 
when X varies at two scales and one is the same scale 
as the single-scale confounder. Finally, I consider the 
possibility that there is additional variability in the 
outcome at another scale, but uncorrelated with X . 

To start, suppose that X(s) and Z(s) share the 
same spatial correlation range, 9, but may have dif- 
ferent marginal variances, namely, X ~ jV(/x x l, 
<t*R(0)) and Z~jV(/x, e l,ofR(0)) and Cov(X,Z) = 
pa x a z Il(9). Straightforward conditional normal cal- 
culations give 

E(4|X) =0 X + [(A- T S- 1 AT)- 1 A' T S- 1 E(Z|X)/3 Z ] 2 



Px + 



(5) 



(X T Y1~ 1 X) 



1 X T YT 1 X 



' fJ>z 
\ 



p — p z fi x 

&x 



P—Pz 



Px+P—Pz, 
&x 



where X = [1 X], [-^ indicates the second element 
of the 2- vector, and £ = a^'K(6) + r 2 I. The resulting 
bias, p^-P z , is the same as if X and Z were not spa- 
tially structured and is also equal to the bias under 
OLS. This demonstrates that we have not adjusted 
for confounding at all by fitting the model that in- 
cludes spatial structure. As with OLS, the model 
attributes as much of the variability as possible to 
the exposure, rather than to the spatially correlated 
residual term, including all of the variability in Z 
that is related to X. If p = 0, the bias is zero in 
(5). This occurs because we average over stochastic 
variability in Z, so any nonorthogonality between X 
and Z in individual realizations contributes to vari- 
ance rather than bias. This contrasts with the bias 
terms in Rice (1986) and Dominici, McDermott and 
Hastie (2004), which are caused by nonorthogonality 
of the fine-scale variation in X and the nonparamet- 
ric component of the model, since neither is treated 
stochastically. 

Next, I keep the same data-generating and model- 
fitting framework, but explore the situation in which 
the exposure varies at two scales. I suppose that 
X{s) is a multi-scale process and introduce corre- 
lation between Z(s) and one of the components of 
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X(s). Let X = X c + ~K U be decomposed into a com- 
ponent, X c , that is at the same scale as the con- 
founder, Z, and a component at a different scale, 
X u , which is independent of X c and Z. Specifically, 
take Cov(X) = a 2 K(9 c ) + ct 2 R(0„), Cov(Z) = 
o|R(0 c ) and Cov(X, Z) = Cov(X c , Z) = pa c a z K(9 c ). 
After some straightforward algebra and matrix ma- 
nipulations, we have 

E(4|X) = [3 X + \{X T Y?- X X)- X X T 
(6) • E*- 1 E(Z|X)/3,] 2 

where 

fc(X) = [(A- T S*~ 1 A')- 1 A' T I]*- 1 M(X - v x l)} 2 p c , 



Pjaimdc) + -HI 



M = (p c I + (l-p c )R(0 u )R(0 



((1-^)1 + p 2 R(0 c )), 



and p z = f3 2 a 2 /((3 2 a 2 + r 2 ). We see that the bias 
term is proportional to that in the single-scale set- 
ting, multiplied by an additional term, fc(X), that 
modulates the bias. fc(X) necessarily includes an 
extra multiplicative factor, p c = (y 2 c j{p 2 c + cr 2 ), that 
quantifies the magnitude of the confounded compo- 
nent of X relative to the total variation in X. While 
the term &(X) is complicated, we can explore its 
dependence on the spatial scales (9 C and 9 U ) and 
the magnitudes of the variance component ratios (p z 
and p c ) to see how the bias compares to the same- 
scale setting. In the following results I average over 
the variability in X. 

For a grid of n = 100 locations on the unit square, 
Figure 2 shows the average of k(X.) over 1000 simu- 
lations as a function of 9 C and 9 U , for combinations of 
p c and p z , where the empirical average approximates 
the expectation with respect to the distribution of 
X. There is a simple pattern to the bias modifica- 
tion relative to the same-scale setting. For 9 C = 9 U 



0.0 0.2 0.4 0.6 0.8 1.0 




0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 

spatial scale of confounding, 6 C 



Fig. 2. The expected value of the bias modification term, Exfc(X), as a function of the spatial scales of confounded (6 C ) and 
unconfounded (9 U ) variability for a selection of values of p z and p c . fc(X) quantifies the amount of bias relative to the bias 
in the same-scale setting or with nonspatial confounding (po z p z /a x ). Along the diagonal (6 C =9 U ) Exfc(X) =p c , which is 
equivalent to no bias reduction. Values near zero indicate substantial bias reduction. 
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(the diagonal elements on the 1 : 1 line) , we do not 
need simulation: Exk(X.) = p c , which is equivalent 
to the same-scale result (5), after accounting for the 
proportion of variability in X that is confounded, p c . 
Note that if one estimates the analogous bias to (6) 
for OLS applied to spatial data, it is nearly constant 
regardless of the spatial scales [ExA;(X) ~ p c ; not 
shown]. Only when 9 U < C , and particularly when 
Qu^-Qc-, do we see less bias than in the same-scale 
setting, with clear potential for bias reduction from 
modeling the residual spatial variation in the out- 
come (recall Figure 1 to interpret the values of 0). 
Above the diagonal, for 9 U > 9 C , Exk(X) > p c , in- 
dicating more bias when the scale of confounding 
is smaller than the scale of unconfounded variabil- 
ity. This situation may be of limited practical in- 
terest, because it's not clear that there are real ap- 
plications in which the unconfounded variation in 
the exposure occurs at larger scales than the con- 
founded variation. However, it does show that there 
are circumstances in which bias is larger than under 
OLS, a point also made in Hodges and Reich (2010). 
Note that the patterns in Figure 2 are qualitatively 
similar regardless of the values of p c and p z . Quan- 
titatively, for larger values of p c , corresponding to a 
larger proportion of the variation in the exposure be- 
ing confounded variation, bias is larger. For larger 
values of p z , corresponding to a larger proportion 
of the residual variation being the contribution of 
the confounder, the effects of the spatial scales are 
more distinct. The results highlight that inclusion 
of the spatial residual does not give unbiased esti- 
mates and bias is substantial in many scenarios even 
when the covariance parameters are known. Results 
are very similar when I sample locations uniformly 
on the unit square or in a clustered fashion (using a 
Poisson cluster process). 

The results in Figure 2 correct for the compli- 
cation that the sample variance of spatial process 
values (calculated over the domain) decreases as 9 
increases. This occurs because the sample variance 
over the domain in a single spatial replicate under- 
estimates population variability; see Figure l(b)-(c) 
for examples. I want to have fixed ratios of aver- 
age sample variances, p z = j5 2 Ezs 2 / {f3 2 Ezs 2 + r 2 ) £ 
{0.1,0.5,0.9} and p c = E Xc s 2 c /{E Xc s 2 c + E Xu s 2 u ) G 
{0.1,0.5,0.9}, for all values of 9 C and 9 U , thereby 
avoiding the introduction of artifacts caused solely 
by having ratios of sample variances change with the 
spatial ranges. Here s 2 , s 2 and s 2 are the sample 
variances of Z, X c and X u , respectively. To achieve 
this, I generate X c ~ Af(0, d 2 a 2 c B,(9 c )) and X. u ~ 



W(0, d 2 t a 2 'R(9 u )) and modify the calculation of A;(X) 
in (6) accordingly. d c and d u are functions of 9 C and 
9 U , respectively, that are chosen such that Ex c s 2 (6 c ) ~ 
a 2 and Ex u s 2 l (9 u ) « a 2 , where s 2 (9 c ) is the sample 
variance of X c for a given realization under 9 C and 
analogously for s 2 ^). The expectations are taken 
with respect to the distribution of the subscripted 
random vector. These manipulations allow me to 
present bias for scenarios that correspond to spe- 
cific ratios of average sample variability of X c , X u , 
Z and e over the spatial domain. 

To have only a single scale of residual spatial vari- 
ability is not very realistic. Therefore, I carried out 
an additional simulation study with residual spa- 
tial variability in the outcome that is independent 
of the exposure and at a smaller scale than the 
scale of Z(s). I suppose that the data-generating 
model is 

(7) Y = /3 l + /3 x .X + /3,Z + h + e, 

and that h ~ M(0, a 2 H(9 u )), independent of X, Z 
and £, with all of the other details as before. Un- 
der this data-generating model and again supposing 
that all variance parameters are known, simulation 
estimates of Ex^(X) indicate that bias is somewhat 
smaller than that seen in Figure 2 for 9 C > 9 U and 
somewhat larger for 9 C < 9 U (not shown). Note that 
if the additional small-scale variability is correlated 
with the exposure, then one is back in the situation 
of having common scales for the exposure and the 
confounder, which is considered at the beginning of 
this section. 

2.4 Bias and Precision with Estimated 
Parameters 

To generalize the results of Section 2.3, which sup- 
posed known variance and spatial dependence pa- 
rameters, I set up a simulation study to assess the 
impact of estimating those parameters. In addition 
to maximum likelihood estimation of a mixed ef- 
fects/kriging model based on the marginal likeli- 
hood (3), I consider the use of penalized likelihood 
to fit the model (2) with a penalized thin plate 
spline spatial term for ^(s). I implemented the pe- 
nalized spline using gam() in R, which uses general- 
ized cross-validation (GCV) for data-driven smooth- 
ing parameter estimation (Wood, 2006). For the core 
simulations, I set the following parameter values, 
al = a 2 = f3 2 a 2 z = 1, r 2 = 4, /3 X = 0.5, p = 0.3, and 
sample 100 spatial locations uniformly from the unit 
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Fig. 3. Relative bias, (E(/3 X ) — fi x )//3 x , as a function of the spatial scales of confounded (9 C ) and unconfounded variability 
(9u): (a) theoretical bias for the mixed/kriging model with known variance parameters, and (b), (c), (d) simulated bias with 
estimated variance/penalty parameters for (b) the mixed model, (c) a penalized spline model, and (d) OLS. 



square. For a range of values of 9 C and 8 U , I simulate 
2000 data sets for each pair {0 C ,6 U }. For each simu- 
lated data set, I generate new spatial locations and 
new values of X and Z; I then generate Y using (4). 
Again we have to account for the reduced empirical 



spatial variability as 6 increases; these simulations 
have effective values of p c = 0.5 and p z = 0.2. 

With regard to bias, the simulation results for 
the mixed/kriging model [Figure 3(b)] reasonably 
match the theoretical values with known variance 
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Fig. 4. Simulation results for (top row) mixed model/kriging fit and (bottom row) penalized spline model. Each plot shows 
results as a function of the spatial scales of the confounded (8 C ) and unconfounded variability (0 U ), with MSE (first column), 
variance of the estimates over the simulations (second column), average squared standard error (third column) and coverage 
(fourth column). 
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parameters [Figure 3(a)]. However, when 9 U <C 9 C , 
the bias is generally larger than with known vari- 
ance parameters, because the fitted model some- 
times estimates little or no spatial structure in the 
residuals, pushing bias results toward the larger bias 
seen under OLS [Figure 3(d)]. Results for the penal- 
ized spline model [Figure 3(c)] show smaller bias for 
9 U < 9 C than the mixed model, presumably caused 
by the difference between estimating the amount of 
smoothing by GCV compared to maximum likeli- 
hood. In either case, spatial scales are critical, and 
bias is smaller than with OLS only when the scale 
of confounding is larger than the scale of the uncon- 
founded variability. Additional simulations indicate 
that as the correlation of confounder and exposure 
increases, or the magnitude of variation in the con- 
founder increases, or the effect size decreases, rel- 
ative bias increases (not shown). In such scenarios, 
substantial bias reduction occurs only for very small 
spatial scales in the exposure and large scales of con- 
founding. 

Figure 4 compares the mixed model with the pe- 
nalized spline in the context of a bias- variance trade- 
off. There is a substantial bias- variance tradeoff, with 
the smaller bias of the penalized spline model (for 
Qc<@u) trading off for increased variance. The result 
is increased mean squared error (MSE) in {3 X , except 
when 9 U is very small. Both model variance esti- 
mates (third column) understate the variability in 
the coefficient estimates (second column), with par- 
ticular underestimation of uncertainty and low cov- 
erage for the mixed/kriging model, and with lower 
coverage as one moves away from the region of 9 C 3> 
9 U . Of course the bias causes much of the poor cov- 
erage. 

Fitting the mixed/kriging model by restricted max- 
imum likelihood (REML) rather than maximum like- 
lihood produces moderate improvement in cover- 
age, with the average variance estimate more simi- 
lar to the variance of the estimated coefficients. Us- 
ing v = 0.5 (i.e., an exponential spatial correlation 
function) in the fitting rather than the true v = 2 
has little effect on results. However, when I generate 
the unconfounded variability, X u , based on v = 0.5, 
bias is substantially smaller than the core results 
(particularly note that there is reduced bias rela- 
tive to OLS when 9 C = 9 U ), apparently because the 
nondifferentiable sample paths of processes with ex- 
ponential covariance play the role of very fine-scale, 
unconfounded variability. There is little change in 
results when using spatial locations simulated using 



a Poisson cluster process with an average of seven 
children per cluster and cluster kernel standard de- 
viation of 0.03. Finally, simulations with p = 0, that 
is, no confounding, indicate no bias for either model, 
as expected. 

Our bias results when p ^ are analogous to the 
bias seen with penalized spline models in He, Mazum- 
dar and Arena (2006) and Peng, Dominici and Louis 
(2006). There, concurvity (i.e., p ^ 0) between the 
smooth temporal term (analogous to our spatial resid- 
ual) and the exposure emerged from the fixed basis 
coefficients chosen based on empirical data 
examples (R. Peng, personal communication; 
He, Mazumdar and Arena, 2006). Similar results are 
seen in the spatial settings of Ramsay, Burnett and 
Krewski (2003). 

The presence of small-scale independent variation 
in the residual (7) reduces bias for 9 U < 9 C (not 
shown), relative to the results presented above. This 
occurs through an increase in the number of degrees 
of freedom estimated from the data to capture resid- 
ual variability, that is, undersmoothing with respect 
to the variation at the 9 C scale, analogous to under- 
smoothing in the partial spline setting (Rice, 1986; 
Speckman, 1988). This scenario seems quite likely 
in applications: if there is large-scale residual spatial 
structure, there is likely to be finer-scale structure 
as well. Thus, analyses that attempt to best fit the 
data may in the process reduce bias from confound- 
ing at the larger scales. 

2.5 The Bias- Variance Tradeoff 

We have seen that even when all covariance pa- 
rameters are known and the scale of confounding is 
much larger than the scale of unconfounded vari- 
ability in X, bias remains, albeit at a much reduced 
level. In principle, if the structure at the confounded 
scale could be exactly fit using a set of basis 
functions, such as a regression spline (e.g., 
Dominici, McDermott and Hastie, 2004), then the 
exposure effect estimate would be unbiased, as in 
any multiple regression. The partial residual ker- 
nel smoothing approach of Speckman (1988) reduces 
bias in a similar fashion, albeit without using a pro- 
jection, through the technique of twicing. However, 
in a real application, one has to choose the basis 
functions, and if the basis functions do not fully 
explain the confounded, large-scale variability, even 
with a basis of seemingly sufficient dimension, this 
will induce a bias. One could instead consider a 
penalized spline approach with penalty parameter 
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chosen in advance to give the desired effective de- 
grees of freedom (e.d.f.). For fixed e.d.f., since the 
penalized spline smoother is not a true projection 
(Speckman, 1988; Peng, Dominici and Louis, 2006), 
one would expect the penalized spline approach to 
have more bias than the regression spline approach. 
Heuristically, bias in this approach occurs because 
the estimated spatial term does not fully explain the 
confounded component of the variability in the out- 
come, causing a bias analogous to that seen in the 
partial spline setting (Rice, 1986; Speckman, 1988). 
However, we would expect the penalized spline to 
be less sensitive to the exact form of the basis func- 
tions and number and placement of knots, as is seen 
in the example (Section 4). Furthermore, one can 
always undersmooth to reduce the bias, following 
the recommendation in the partial spline literature 
(Rice, 1986; Speckman, 1988). Thus, using a pe- 
nalized spline seems reasonable, albeit without the 
clean interpretation of a projection. I show below 
that simulations comparing regression spline and pe- 
nalized spline models support these theoretical re- 
sults from the literature, in the spatial context con- 
sidered here, with the regression spline having re- 
duced bias and increased variance relative to penal- 
ized modeling. 

The primary issue in an application is choosing 
the amount of smoothing to reduce bias, since in- 
ference about /3 X is the goal rather than best fit- 
ting the data. Data-driven smoothing might reduce 
bias (if there is small-scale residual correlation) or 
might have little effect on bias (if the data suggest 
only large-scale residual correlation). Thus, the re- 
duction in bias will depend on the scales involved 
and the actual amount of smoothing done, and the 
analysis will reveal little about the sensitivity of es- 
timation to scale. Instead, one could explicitly assess 
the bias- variance tradeoff by varying the amount of 
smoothing and assessing the sensitivity of the expo- 
sure effect inference. One approach is a spatial ana- 
logue to the sensitivity analysis approaches of Peng, 
Dominici and Louis (2006): fit a model with spatial 
basis functions and vary the e.d.f. (e.g., Zeger et al., 
2007). Plotting (3 X and uncertainty intervals as a 
function of e.d.f. (or some other metric) provides an 
assessment of the robustness of results to potential 
spatial confounding at various scales. If one is con- 
cerned about confounding at a particular scale, then 
one can report the results for an e.d.f. that would un- 
dersmooth with respect to that scale to reduce bias, 
accepting the tradeoff of increased uncertainty. 



Motivated by this analysis strategy, I set up a sim- 
ulation under the settings of Section 2.4, using a re- 
gression spline (i.e., unpenalized fixed effects) and 
varying the e.d.f. by changing the dimension of the 
basis in gam() in R. Figure 5(a) shows relative bias 
as a function of the spatial scales involved. As be- 
fore, I focus on the results below the 1 : 1 diagonal 
(8 C = 9 U ), as this is the scenario of practical inter- 
est. By choosing a large number of e.d.f., one can 
decrease bias more effectively than when estimating 
the amount of smoothing from the data [i.e., Fig- 
ure 3(c)]. However, with moderate and large scale 
variability, the variance of the estimates in this fixed 
effects model increases dramatically [Figure 5(b)]. 
This causes a concordant increase in the MSE (not 
shown), highlighting the bias- variance tradeoff. In 
contrast, using a penalized spline with fixed e.d.f. 
[fixing the smoothing parameter in gam() in R] shows 
much more stable results. As expected, for a given 
e.d.f. bias is not reduced as much as with a regression 
spline [Figure 5(a)], but there is much less variability 
[Figure 5(b)]. 

A diagnostic approach to understanding whether 
the residual may include variation from an unmea- 
sured confounder is to assess the correlation between 
the residual and the exposure. Not knowing /3 X , one 
might use a variety of plausible values of j3 x to es- 
timate g and then calculate the correlation with X 
(and potentially with filtered versions of X that ex- 
clude small-scale variation). 

2.6 Accounting for Residual Spatial Correlation 

If one accounts for large-scale variation as a means 
of reducing bias, there may still be small-scale resid- 
ual variation, such as fine-scale correlation in health 
outcomes related to residential sorting. As I have 
shown, one can reduce potential confounding bias 
from this fine-scale variation through explicit spatial 
modeling only if there is variability in the exposure 
at an even smaller scale. If there is not, then one 
is effectively assuming that the fine-scale variation 
is uncorrelated with the exposure. Given this as- 
sumption, one may need to account for the fine-scale 
residual spatial variation so that uncertainty esti- 
mation for P x is not compromised (but note the re- 
sults of Section 3.3). One possibility would be to use 
an analysis robust to misspecification of the resid- 
ual variance, for example, using an estimating equa- 
tion with uncertainty based on the sandwich esti- 
mator, with regression spline terms in the mean to 
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account for large-scale spatial confounding bias. Al- 
ternatively, one could fit a penalized model with 
the amount of smoothing determined from the data. 
This has two effects. First, it naturally accounts for 
the effect of the spatial structure on uncertainty 
estimation. Second, in the presence of small-scale 
residual variability, the model will naturally under- 
smooth with respect to large-scale variability that 
may cause confounding, thereby reducing bias from 
confounding at the larger scale, as discussed previ- 
ously. 



3. SPATIALLY CORRELATED RESIDUALS 
AND PRECISION 

In this section I suppose that the residual and the 
exposure are independent (p = in the framework 
of Section 2), which results in unbiased estimation 
of f3 x . I consider effects of spatial scale on the fol- 
lowing questions about efficiency of estimators for 
(3 X (henceforth simply (3) and quantification of un- 
certainty: 

(1) Given a fixed amount of residual variation, how 
is efficiency affected by the proportion of that 
variation that is spatial? 



(a) Bias 
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(b) Variance 
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Fig. 5. Simulation results for relative bias (a) and variance (b) of j3 x as a function of the spatial scales of confounded (9 C ) 
and unconfounded (8 U ) variability for regression splines (top rows) and penalized splines (bottom rows) with 5, 15 and 30 
e.d.f., where the e.d.f. are pre- specified, rather than estimated based on the data. 
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(2) What is the magnitude of the improvement in 
efficiency when accounting for residual spatial 
variation, relative to OLS? 

(3) If one uses the naive estimator for the variance 
of the OLS estimator, /3ols 5 what is the mag- 
nitude of the error in uncertainty estimation 
compared to the correct variance estimator for 

/W 

The first question does not appear to have been 
raised in the literature. With regard to the second, 
while we know that GLS is the most efficient estima- 
tor when the residuals are correlated, here I investi- 
gate the magnitude of this efficiency advantage as a 
function of the spatial scales involved. Regarding the 
third, the conventional wisdom in the statistical and 
applied literature appears to be that not account- 
ing for spatial structure leads to underestimation 
of uncertainty (e.g., Legendre, 1993; Burnett et al., 
2001; Schabenberger and Gotway, 2005, page 324). 
However, I have not seen a formal quantification of 
this underestimation for a regression coefficient, in 
contrast to our understanding of the potentially se- 
vere underestimation of uncertainty for the mean 
of a spatial process (Cressie, 1993, Section 1.3; 
Schabenberger and Gotway, 2005, Section 1.5). 

Note that there are three variance estimators (i.e., 
estimators for the sampling variability of the es- 
timated regression coefficient) under consideration 
here: the true GLS variance estimator, and the true 
and naive OLS variance estimators. When p = 0, 
OLS is unbiased, so it makes sense to consider OLS 
for estimation, provided we adjust the usual OLS 
variance estimator to account for the residual spa- 
tial correlation. While actual applications will likely 
involve more complicated modeling, consideration 
of these questions in this simple setting, and with 
known variance components, helps to understand 
the basic issues. 

3.1 Relationship Between Spatial Scale 
and GLS Efficiency 

Given a fixed amount of residual variation, how 
is efficiency affected by the proportion of variation 
that is spatial? I quantify efficiency in terms of pre- 
cision rather than variance, as this allows for closed 
form derivations. 

Lemma 3.1. Consider the model (3) and sup- 
pose that all parameters are known except /3q and 
j3 = (3 X . The expectation of the precision of /3gls > 



with respect to the sampling distribution of X ; is 
Ex^ar^GLs)" 1 ) 

(8) =^^ftT{£- 1 R(0 x )} 

9 ^ 

l T E~ 1 R(6 ) a .)^" 1 l \ 
l^S-il /' 

where £ = (1 - p g )l + p g R(9 g ), p g = o- 2 g /{a 2 g + r 2 ), 
and the remaining notation follows that in previous 
sections. See the Appendix for the proof. 

Note that the term in parentheses is an effective 
sample size, analogous to n — 1 in the nonspatial 
problem. Here the adjustment is for spatial struc- 
ture in residual and exposure, with the second com- 
ponent in the parentheses analogous to the degree 
of freedom lost for estimating a mean. 

Figure 6(a) shows Monte Carlo estimates of the 
expected precision as a function of 9 X and 6 g , av- 
eraging (8) over 500 sets of n = 100 locations sim- 
ulated uniformly on the unit square. I report the 
expected precision divided by a baseline of ex 2 (n — 
1)/(t 2 + er 2 ), which is the expected precision in the 
nonspatial setting, supposing that the total resid- 
ual variation, r 2 + cr 2 , remains constant. Compared 
to the nonspatial setting, lower precision occurs un- 
less the exposure varies at small spatial scale. When 
the exposure varies at small spatial scale and the 
residual at larger spatial scale, precision can be sub- 
stantially greater than in the nonspatial setting. The 
model is able to account for part of the residual vari- 
ance through the spatial structure, as if the spatial 
structure were an additional covariate to which vari- 
ation in the response is attributed. GLS implicitly 
estimates the process, g in (2), that gives rise to the 
marginalized model (3). This reduces the remain- 
ing "unexplained" residual variability and thereby 
improves efficiency relative to having independent 
errors but equivalent overall residual variability. In 
contrast, when X varies only at large spatial scales, 
then efficiency decreases because of difficulty in dis- 
tinguishing (3X(s) from g(s). Results are similar us- 
ing points on a regular grid or clustered based on a 
Poisson cluster process. 

3.2 Efficiency of GLS and OLS Estimators 

Here I consider how spatial scale affects the rela- 
tive efficiency of spatial and nonspatial estimators, 
comparing the precisions of the OLS and 
GLS estimators. Since the true OLS variance, 
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[(X t X)~ 1 (X t (t 2 I + a 2 g K g )X)(X T X)-% 2 , is a 
complicated function, it is difficult to derive closed 
form expressions for efficiency relative to the GLS 
estimator. Instead I conduct a small simulation study. 
For a regular grid of values of 9 X and 9 g , I carry out 
500 simulations for each pair of values, with n = 100 
observations whose spatial locations are drawn uni- 
formly over the unit square domain. Note that I con- 
sider the ratio of the GLS precision to the OLS pre- 



cision, so the values of a 2 and a 2 + r 2 cancel out of 
the ratio and do not affect the results. 

Figure 6(b) shows the Monte Carlo estimates of 
the expected relative precision, as a function of the 
spatial scales, 9 g and 9 X , and the proportion of the 
residual variability that is spatial. When little of the 
residual variability is spatial (p g = 0.1), there is lit- 
tle gain in precision, as expected. When more is spa- 
tial, the gains in precision are small when g varies 
at a small scale, but substantial when g varies at a 
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Fig. 6. Efficiency and precision results for three values of p g = <jg/(<7 2 + r 2 ) (columns) as a function of the spatial scales 
of the residual (9 g ) and the exposure (0 X ). (a) The log of the expected precision of the GLS estimator (8), relative to the 
expected precision in the nonspatial setting with equivalent total residual variation, (b) Relative efficiency of GLS and OLS 
estimation, quantified as the log of the expected ratio of GLS to OLS precision, (c) The log of the expected ratio of the correct 
and naive OLS variance estimators (9). The results are based on 500 simulations for each set of parameter values, with a 
Matern correlation with v = 2 and 100 locations sampled uniformly over the unit square. 
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large scale. Unfortunately, this is also precisely the 
case in which one would be concerned about spa- 
tial confounding. If we suppose that the large-scale 
structure in the residual has been controlled for in 
an effort to reduce the potential for bias, then with 
the remaining residual variability being fine scale, 
there is limited gain in precision regardless of the 
spatial scale of the exposure. With locations on a 
regular grid, the gains in precision are slightly less 
for small values of 9 g , while with Poisson cluster 
process sampling, the gains are somewhat larger for 
small values of 9 g . See also Dow, Burton and White 
(1982) for similar simulation results when a Markov 
random field structure induces the correlation. 

3.3 Underestimation of Uncertainty by the 
Naive OLS Variance Estimator 

Applied analyses often ignore residual spatial cor- 
relation, raising the question of how strongly uncer- 
tainty estimates are affected. One can express the 
ratio of the true OLS variance to the incorrect naive 
OLS variance as follows. First define W = (X — 
XI) /s where s 2 = ^XX^i ~ x ) 2 . After express- 
ing (3 X = [(_X T X)" 1 X T Y] 2 = (X T X) -1 X T Y where 
X = X — XI, we have 



Var true ( ( 5 :E 



(X T X) 



Var r 



(X r X)(X T SX)" 1 X T X 



X J EX 1 



X T X 



W T SW. 



n 



Averaging over the sampling distribution of X, we 
have 



(9) 



1. 



E x I -W T SW , 

n J 



1 



n 



tr(SCov(W)). 



So for 5] « I or Cov(W) ~ I, that is, when either 9 g 
or X is close to zero, we expect the ratio to be near 
one. Note also that with spatial correlation func- 
tions that are nonnegative, the only negative contri- 
bution to the ratio can be from negative covariances 
induced by standardizing X. Such negative covari- 
ances should diminish as the sample size increases, 
so we expect the ratio to generally be no smaller 
than one, indicating that the naive variance does un- 
derestimate uncertainty. Finally, the largest values 
of the ratio would occur with large positive correla- 
tions in corresponding elements of S and Cov(W), 
which is to be expected when both g and X show 
large-scale variation. 



Figure 6(c) supports these heuristic results, show- 
ing the average ratio of variances in simulations, 
where the simulations are conducted as in Section 3.2. 
The ratio is close to one when either of the spa- 
tial terms has fine-scale variability and far from one 
when both have large-scale behavior. This result is 
similar to that of Bivand (1980) for inference about a 
correlation coefficient and to (Johnston and DiNardo, 
1997, page 178) under serial autocorrelation in a re- 
gression setting. As expected, when the proportion 
of residual variability is smaller (moving from the 
bottom left to bottom right panels), the expected 
ratio gets closer to one. This indicates that when 
nonspatial variation dominates the residual and the 
spatial structure in the residual or exposure is not 
too large in scale, the naive variance estimator may 
be reasonable. A lack of large-scale residual struc- 
ture might result from having accounted for large- 
scale variation in attempting to reduce spatial con- 
founding bias. Results with gridded locations show 
ratios slightly closer to one, and with clustered loca- 
tions, ratios further from one. Note that the uncer- 
tainty estimate in any given naive analysis may be 
larger than when fitting a spatial model because the 
more sophisticated model both corrects the variance 
estimate, which increases the estimated uncertainty, 
and uses a more efficient estimator, which decreases 
the fundamental uncertainty. 

Simple simulations with spatial ranges and sam- 
pling designs specific to an analysis could be easily 
carried out for further guidance in a given setting, 
allowing one to assess whether ignoring the spatial 
structure has substantial impact on uncertainty es- 
timation. Accounting for small-scale spatial correla- 
tion requires estimation of the spatial structure and 
is often computationally burdensome, so an assump- 
tion of independence can have an important practi- 
cal benefit. Of course in some analyses, any under- 
estimation of variability may be cause for concern, 
in which case use of the naive variance estimator 
would not be tenable. 

4. CASE STUDY: BIRTHWEIGHT AND 
AIR POLLUTION 

Chronic health effects of ambient air pollution in 
developed countries involve small relative risks, but 
are of considerable public health importance because 
of widespread exposure. Epidemiologic studies at- 
tempt to estimate a small effect from data with high 
levels of variability and stronger effects from other 
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covariates, including potential confounders such as 
socioeconomic status, so spatial confounding bias is 
of critical concern. 

I reanalyze data on the association between ambi- 
ent air pollution (estimates of black carbon, a com- 
ponent of particulate matter) and birthweight in 
eastern Massachusetts (Zeka, Melly and Schwartz, 
2008; Gryparis et al., 2009). These analyses found 
significant negative effects of traffic proxy variables 
and black carbon, respectively, on birthweight. Gry- 
paris et al. (2009) used several methods to try to 
account for effects of measurement error in the pre- 
dicted black carbon concentrations, which are based 
on a regression model that accounts for spatial and 
temporal structure and key covariates. 

I follow these analyses in using an extensive set 
of covariates to try to account for potential con- 
founding. I use smooth terms for mother's age, ges- 
tational age and mother's cigarette use, to account 
for nonlinearities, a linear term for census tract in- 
come, and categorical variables for the following: 
presence of a health condition of the mother, previ- 
ous preterm birth, previous large birth, sex of baby, 
year of birth, index of prenatal care and mater- 
nal education. The exposure of interest is the esti- 
mated nine-month average black carbon concentra- 
tion at the geocoded address of the mother, based 
on a black carbon prediction model (Gryparis et al., 
2007). Following Gryparis et al. (2009), for simplic- 
ity, I exclude the 13,347 observations with any miss- 
ing covariate values, giving 205,713 births. 

In Gryparis et al. (2009) we found no evidence 
of residual spatial correlation based on a spatial 
semivariogram. Further analysis here indicates that 
there is significant residual spatial variation but that 
nonspatial variation overwhelms the magnitude of 
this variation. Figure 7(a) is a semivariogram show- 
ing no evidence of spatial structure, while a spatial 
smooth of model residuals [Figure 7(b)] indicates 
clear spatial structure. While individual nonspatial 
variability among babies swamps the spatial varia- 
tion (hence the flat semivariogram), it is large rela- 
tive to the estimated pollution effect (note the sur- 
face values in the range of —40 to 40, for comparison 
with effect estimates in Figure 8). Thus, if the resid- 
ual spatial variation is caused by spatially varying 
confounders, it could bias estimation of the pollu- 
tion effect. 

To include a spatial term in models of birthweight, 
I consider a regression spline, an unpenalized ap- 
proach, and a penalized spline, both with e.d.f. cho- 



sen in advance (see Section 2.5), as well as a pe- 
nalized spline with data-driven smoothing parame- 
ter estimation based on GCV, all implemented in 
gam() in R, using the thin plate spline basis. Note 
that the thin plate regression spline approach im- 
plemented in gam() should minimize sensitivity to 
knot placement (Wood, 2006). 

I first add a spatial term to the model with the full 
set of covariates to assess whether some of the esti- 
mated effect may be biased by spatial confounding. 
Figure 8(a) shows how the estimated effect of black 
carbon varies with the e.d.f. and the spatial smooth- 
ing approach. The estimate attenuates somewhat as 
more e.d.f. are used to account for the spatial struc- 
ture. For the penalized spline, as more than about 
10 e.d.f. are used, the upper confidence limit ex- 
ceeds zero, and for larger e.d.f., the upper limit in- 
creases further. GCV chooses 157 e.d.f., indicating 
fairly small-scale spatial structure in the data. For 
context note that with 129 e.d.f. in Figure 7(b) we 
see spatial features at the scale of individual towns. 
While the regression spline approach implemented 
here avoids having to choose the knots, the empir- 
ical results are still very sensitive to e.d.f., in con- 
trast to the stability of the penalized spline solu- 
tion as the e.d.f. varies. For both penalized and re- 
gression splines, there is a clear bias-variance trade- 
off, with increasing variance as the number of e.d.f. 
increases. However, for this problem with a very 
large sample size, the confidence intervals do not 
increase drastically, nor is there much difference in 
the uncertainty between the regression and penal- 
ized spline approaches. The spatial confounding as- 
sessment suggests that while we have somewhat re- 
duced confidence in the black carbon effect, the ef- 
fect estimate is reasonably stable even when using 
a spatial term with a large number of degrees of 
freedom. 

Next I consider what might have happened if most 
of the covariates (particularly the ones related to so- 
cioeconomic status) were not measured, potentially 
inducing serious confounding. Figure 8(b) indicates 
that without any spatial term in the model, the ef- 
fect estimate is —23.0 with a 95% confidence interval 
of (—26.8, —19.2), indicating a much more substan- 
tial effect of black carbon than the fully adjusted 
model. As soon as one accounts for spatial struc- 
ture, even with a small number of e.d.f., the esti- 
mate attenuates, approaching the fully adjusted es- 
timate, with the upper confidence limit rising above 
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(a) Semivariogram (b) Spatially-smoothed Residuals 




10 20 30 40 
distance (km) 



Fig. 7. (a) Semivariogram of full model residuals, with the first point representing births to mothers living at the same 
location, (b) Spatial smooth of residuals with town boundaries in grey. The spatial smooth, with 129 e.d.f. chosen by GCV, is 
highly significant. 



zero. The reduced model appears to suffer from seri- 
ous confounding, with the estimated pollution effect 
apparently driven by large-scale association of pol- 
lution and birthweight. The spatial analysis is able 
to account for much of this apparent confounding, 
substituting for a rich set of covariates. 

Ideally one would fit a model that accounts for 
fine-scale spatial structure to improve one's confi- 
dence in the uncertainty estimation. However, with 
205,713 observations, this is a computational chal- 
lenge that I do not take up here. Given the results 
in Section 3.3 that indicate that large-scale struc- 
ture causes most of the variance underestimation, 



one can hope that the uncertainty at the larger val- 
ues for the spatial e.d.f. in Figure 8 may reasonably 
approximate the true uncertainty. 

5. DISCUSSION 

Considerations of scale are critical in spatial re- 
gression problems. Standard spatial regression models, 
which use spatial random effects, kriging specifica- 
tions or a penalized spline to represent the spatial 
structure, are penalized models with inherent bias- 
variance tradeoffs in estimating the smooth func- 
tion. Under unmeasured spatial confounding, the 
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Fig. 8. For the model with the full set of covariates (a) and the reduced set of covariates (b), black carbon effect estimates 
and 95% confidence intervals based on different specifications for the spatial term in an additive model: black pluses indicate 
the model with no spatial term and green dots with the e.d.f. chosen by GCV, while black (regression spline) and red (penalized 
spline) dots indicate results when fixing the degrees of freedom at a set of discrete values. The lines through the points and 
corresponding dashed lines are taken by connecting the effect estimate and confidence interval bounds for the discrete set. 
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bias carries over into estimating the coefficient for 
the exposure of interest, but the degree of bias de- 
pends on the spatial scales involved. Inclusion of 
a spatial residual term accounts for spatial corre- 
lation in the sense of reducing bias from unmea- 
sured spatial confounders only when there is uncon- 
founded variability in the exposure at a scale smaller 
than the scale of the confounding. If the variation 
in exposure is solely at large scales, there is lit- 
tle opportunity to reduce spatial confounding bias, 
but with a component of small-scale exposure vari- 
ability, large-scale spatial confounding bias can be 
reduced substantially. Accounting for large-scale 
residual correlation is also important for improving 
precision of regression estimators and for correctly 
estimating uncertainty. In contrast, when residual 
correlation occurs at small scales, there is little op- 
portunity for reducing spatial confounding bias at 
those scales or improving regression estimator pre- 
cision. However, under the assumption of no small- 
scale confounding, fitting such residual structure can 
reduce bias from larger scale confounding by caus- 
ing undersmoothing with respect to the large-scale 
structure. While the results here are limited to the 
simple setting of linear regression/additive models 
with a single covariate and single unmeasured con- 
founder, I expect that the qualitative results and 
principles hold in more complicated settings, with 
no reason to believe that the bias results would im- 
prove in more complicated models. 

Sensitivity analyses that show the bias-variance 
tradeoff function of the scale at which 

the spatial residual structure is modeled 
(Peng, Dominici and Louis, 2006; Zeger et al., 2007) 
offer one approach that helps to frame the issue of 
bias in the context of the spatial scales involved. In 
choosing a spline formulation to carry out such an 
analysis, while a regression spline has an appealing 
interpretation and in theory result in less bias in 
estimating the effect of interest, a penalized spline 
with a fixed effective degrees of freedom may give 
more stable results. Of course the sensitivity anal- 
ysis approach does not answer the question of how 
to get a single estimate of the effect of interest. One 
might also consider an approach similar to that of 
Beelen et al. (2007) and explicitly decompose the 
exposure into multiple scales, including exposure at 
each scale as a separate covariate and focus causal 
interpretation on the effect estimates for the smaller 
scales (e.g., Janes, Dominici and Zeger, 2007). Lu 
and Zeger (2007) use matching estimators for each 



pair of observations and assess how effect estimates 
vary with spatial lag between the pairs to assess sen- 
sitivity. Note that estimating equation approaches 
are not capable of reducing bias from unmeasured 
spatial confounding because the marginal variance is 
assumed to be unrelated to the exposure and varia- 
tion is not attributed to a spatial term. 

From the econometric perspective, spatial 
confounding bias might be seen as a type of endo- 
geneity bias, with exposure the endogenous variable 
and the unconfounded component of exposure, or 
some proxy for it, an exogenous variable. Since the 
unconfounded component is not measured directly, 
some sort of scale decomposition appears necessary. 
Standard endogenous variable techniques such as 
two-stage least squares and instrumental variable 
methods (Johnston and DiNardo, 1997) do not ap- 
pear directly useful but do share commonalities with 
approaches mentioned above. 

Others have noted the identifiability problems in 
spatial models, with sensitivity of effect estimates to 
inclusion of a spatial residual term when the 
covariates vary spatially (Breslow and Clayton, 
1993; Clayton, Bernardinelli and Montomoli, 1993; 
Burden et al., 2005; Lawson, 2006, page 187; 
Augustin et al., 2007; Wakefield, 2007). A different 
methodologic perspective than that presented here 
has been taken by Reich, Hodges and Zadnik (2006) 
and Houseman, Coull and Shine (2006), who esti- 
mate the effect of exposure, X, by forcing the spa- 
tial residual to be orthogonal to X, attributing as 
much variability as possible to X. This approach 
makes a very strong assumption of no confound- 
ing to avoid over adjustment bias from accidentally 
accounting for some of the effect of the covariate 
in the residual. Note that the residuals and covari- 
ates are not orthogonal under GLS estimation (Sch- 
abenberger and Gotway, 2005, page 349). Gustafson 
and Greenland (2006) confront a similar problem of 
modeling systematic residual confounding in a con- 
text with identifiability problems, finding that im- 
posing structure through a prior distribution in a 
nonidentified model can help account for a portion 
of the confounding, improving bias and precision of 
estimators. 

Note that measurement error in the exposure is of 
critical concern, because reducing bias relies on esti- 
mating variability in exposure at scales smaller than 
the confounding. In many contexts, measurement er- 
ror becomes an increasing concern at small scales 
because of limitations in measurement resources. In 
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contrast, large-scale exposure variation may be well 
estimated using spatial smoothing and regression 
models, thereby inducing Berkson-type error 
through what is effectively regression calibration 
(Gryparis et al., 2009). To the extent to which ac- 
counting for bias forces one to rely on exposure es- 
timates more likely contaminated by classical mea- 
surement error, one may find oneself reducing bias 
from confounding only to increase it from measure- 
ment error. To the extent small-scale variation is 
affected by Berkson error, one would increase vari- 
ance but not incur bias by relying on the small-scale 
variation. 

Finally note that in many settings one has aggre- 
gated exposure and outcome data, so one has limited 
ability to identify effects of exposure based on fine- 
scale variation because the aggregation eliminates 
the fine-scale variation (e.g., Janes, Dominici and 
Zeger, 2007). This suggests that accounting for spa- 
tial confounding with areal data, for which 
researchers often use standard conditional 
auto-regressive models, is likely to be ineffective 
when aggregating over large areal units, which is 
consistent with the bias seen in Richardson (2003). 
In work concurrent with that presented here, Hodges 
and Reich (2010) have investigated bias in the areal 
setting under a variety of perspectives on the spa- 
tial random effects, also making the case for the ap- 
proach taken in Reich, Hodges and Zadnik (2006). 

APPENDIX: PROOF OF LEMMA 3.1 

From the definition of the GLS estimator, we have 

Var(/3 GLS ) = \X T Tr l XY 2 l 

\ T YT X \ 

~ 1 T E- 1 1X T E- 1 X-X T E- 1 11 T S- 1 X' 

Using the definitions of E and p g , and taking the 
reciprocal, we have 

Prec(/3 GLS ) = -^—^ (x T E^X 

Conclude by taking the expectation with respect to 
the sampling distribution of X, using the expecta- 
tion of a quadratic form, and rearranging the matri- 
ces inside the second trace to give a scalar: 

Ex(Prec(&5Ls)) 



tr(S- 1 ll T E- 1 R(6' :r )) 

~^fT^{ { {x)) iTE=*I J' 
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